Conversation

Copilot AI commented Dec 3, 2025

The repo was maintaining a patched fork of llama.cpp's server, which required manual patch maintenance on every llama.cpp version bump. Vanilla llama-server already supports /v1/models during model loading, so the fork is no longer needed.

Changes

  • CMakeLists.txt: enable LLAMA_BUILD_SERVER and LLAMA_BUILD_TOOLS to build the upstream llama-server, then copy it to com.docker.llama-server for binary-name compatibility (see the CMake excerpt below)
  • Removed llamacpp/native/src/server/: Deleted forked server.cpp, utils.hpp, httplib.h, server.patch, and build files
  • Updated README.md: Simplified llama.cpp version bump instructions (no more patching)
# Build vanilla llama-server from submodule
set(LLAMA_BUILD_COMMON ON CACHE BOOL "" FORCE)
set(LLAMA_BUILD_TOOLS ON CACHE BOOL "" FORCE)
set(LLAMA_BUILD_SERVER ON CACHE BOOL "" FORCE)
add_subdirectory(vendor/llama.cpp)

# Copy the binary to ${SERVER_OUTPUT_NAME} (com.docker.llama-server) for backwards compatibility
add_custom_target(docker-llama-server ALL
    DEPENDS llama-server
    COMMAND ${CMAKE_COMMAND} -E copy
        $<TARGET_FILE:llama-server>
        ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/${SERVER_OUTPUT_NAME}
)
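
For reference, a build-and-verify sequence might look like the following; the build directory name and the bin/ output path are illustrative assumptions, not taken from the repo's actual build scripts:

$ git submodule update --init --recursive       # make sure vendor/llama.cpp is checked out
$ cmake -B build
$ cmake --build build --target docker-llama-server
$ ls build/bin/com.docker.llama-server          # renamed copy of the upstream llama-server

With the patch gone, a llama.cpp version bump should roughly reduce to moving the submodule pointer (the exact steps are whatever the updated README now describes):

$ cd vendor/llama.cpp
$ git fetch origin && git checkout <new-release-tag>
$ cd ../.. && git add vendor/llama.cpp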
Original prompt

This section details the original issue you should resolve

<issue_title>Unfork llama.cpp</issue_title>
<issue_description>We should be able to use the code, cmake files, etc. from llama.cpp as is:

docker#471</issue_description>

Comments on the Issue (you are @copilot in this section)

@ericcurtin @doringeman you seem to be in the know here. I can do this against vanilla llama-server (which we should be using) no problem, so what is different about the fork that we cannot do the same?
$ curl -k http://127.0.0.1:8080/v1/models
{"models":[{"name":"unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:Q4_K_M","model":"unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:Q4_K_M","modified_at":"","size":"","digest":"","type":"model","description":"","tags":[""],"capabilities":["completion"],"parameters":"","details":{"parent_model":"","format":"gguf","family":"","families":[""],"parameter_size":"","quantization_level":""}}],"object":"list","data":[{"id":"unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:Q4_K_M","object":"model","created":1764770422,"owned_by":"llamacpp","meta":{"vocab_type":2,"n_vocab":151936,"n_ctx_train":262144,"n_embd":2048,"n_params":30532122624,"size":18550716416}}]}

Copilot AI changed the title from "[WIP] Unfork llama.cpp to enable usage as is" to "Unfork llama.cpp: use vanilla llama-server directly" on Dec 3, 2025
Copilot AI requested a review from ericcurtin December 3, 2025 19:35